MATCHED SURVIVAL ANALYSIS (MSURV)


1. INTRODUCTION

The purpose of this statistical package is to analyze matched
observations which are subject to arbitrary right censorship. The input data
are assumed to arise from a matched survival study in which cases are matched
to controls with respect to suspected confounding variables. The input data
are therefore assumed to consist of match sets of the form

$$M = [(t_1, \Delta_1, z_1), (t_2, \Delta_2, z_2), \ldots, (t_m, \Delta_m, z_m)],$$

where, for $u = 1, 2, \ldots, m$, the $t_u$ are survival, or response, times,

$\Delta_u = 1$, if $t_u$ is an uncensored observation
$\Delta_u = 0$, if $t_u$ is censored

and

$z_u = 1$, if the subject responding at $t_u$ is a case
$z_u = 0$, if the subject responding at $t_u$ is a control.

Censorship is assumed to be due to loss to follow-up, death from causes other
than those of interest, or survival to the time of analysis.

The methods in this program are designed to accommodate match sets, of
varying length, which have been allocated to several strata. The numbers of
case and of control subjects are also allowed to vary from match set to match
set. As a special case, this program analyzes match sets arising from a
1-to-R matched design, in which each case is matched to R controls, $R \geq 1$.

In this documentation we describe the MSURV features: life table output
for cases and controls; Kaplan-Meier survival curve estimates and confidence
bands for cases and controls, with a plotting option; linear rank tests for
comparing cases and controls; and a test for comparing cases and controls with a
Life Table distribution. The Kaplan-Meier survival curve estimate is
programmed from Kalbfleisch and Prentice (Ref. 3: p. 16, eq. 1.10). The
confidence band routine is programmed from Hall and Wellner (Ref. 2). The linear
rank tests, Michalek and Mihalko (Ref. 4), are match set versions of the
logrank and Wilcoxon tests for censored data. The test for comparing a study
population with a Life Table distribution is programmed from Gail and Ware
(Ref. 1).

2. LIFE TABLE OUTPUT AND SURVIVAL CURVE ESTIMATES

Letting $t_{(1)} < t_{(2)} < \cdots < t_{(k)}$ denote the distinct ordered uncensored
observations, with $t_{(0)} = 0$, the following defined numbers are printed at each
$t_{(i)}$, $i = 0, 1, 2, \ldots, k$:

AT RISK  = the number of subjects alive and still in the study at
           time $t_{(i)}$

FAILURES = the number of subjects dying at $t_{(i)}$

CENSORED = the number of subjects censored in $[t_{(i)}, t_{(i+1)})$

F(T)     = the value of the Kaplan-Meier estimate at $T = t_{(i)}$,

where

$$F(t) = \prod_i [(n_i - d_i)/n_i],$$

with $n_i$ = AT RISK and $d_i$ = FAILURES at $t_{(i)}$, the product being taken
over the uncensored times $t_{(i)}$ up to $t$ (Ref. 3: p. 16, eq. 1.10).

Two such tables are printed: one for the cases and one for the controls.
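The table columns defined above can be illustrated with a short sketch. This is not the MSURV code itself; the function name `life_table` and its arguments are hypothetical, response times are assumed to be positive, and the estimate reported at each $t_{(i)}$ is assumed to include the factor for $t_{(i)}$ itself (the convention of Ref. 3 at the jump points may differ).

```python
from collections import Counter

def life_table(times, deltas):
    """Rows (t, AT RISK, FAILURES, CENSORED, F) at t_(0)=0 and each distinct uncensored time.

    times  : observed response times (assumed positive)
    deltas : 1 if the time is uncensored, 0 if censored
    """
    deaths = Counter(t for t, d in zip(times, deltas) if d == 1)
    grid = [0.0] + sorted(deaths)              # t_(0)=0 < t_(1) < ... < t_(k)
    rows, surv = [], 1.0
    for i, t in enumerate(grid):
        upper = grid[i + 1] if i + 1 < len(grid) else float("inf")
        at_risk = sum(1 for x in times if x >= t)
        failures = deaths.get(t, 0)
        # subjects censored in [t_(i), t_(i+1))
        censored = sum(1 for x, d in zip(times, deltas) if d == 0 and t <= x < upper)
        if failures:
            surv *= (at_risk - failures) / at_risk   # Kaplan-Meier factor (n_i - d_i)/n_i
        rows.append((t, at_risk, failures, censored, surv))
    return rows

for row in life_table([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(row)
```

Calling the function once on the cases and once on the controls would give the two tables described above.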

In addition, letting $t'_{(1)} < t'_{(2)} < \cdots < t'_{(L)}$ denote the distinct ordered
censoring times, with $t'_{(0)} = 0$, the following defined numbers are, as an option,
printed at each $t'_{(j)}$, $j = 0, 1, \ldots, L$:

AT RISK  = the number of subjects in the study at risk of being
           censored at $t'_{(j)}$

CENSORED = the number of subjects censored at $t'_{(j)}$

FAILURES = the number of subjects dying in $[t'_{(j)}, t'_{(j+1)})$

H(T)     = the value of the Kaplan-Meier estimate of the
           censoring distribution at $T = t'_{(j)}$.

Two such tables are printed as an option: one for the cases and one for the
controls.


in which $T_j$ is the value of a linear rank statistic T computed on a match set
$M_j$ of the form M; $\hat\sigma^2$ is an estimate of VAR(T); and $\hat\sigma_j^2$ is the value of
$\hat\sigma^2$ computed on $M_j$. The test T is taken from the class of efficient scores
procedures (Ref. 3: p. 146, eq. 6.6):

$$T = \sum_{i=1}^{k} \Big[ c_i z_{(i)} + C_i \sum_{j=1}^{m_i} z_{ij} \Big], \qquad (2)$$

where $t_{(1)} < t_{(2)} < \cdots < t_{(k)}$ are the ranked times of the k, $k \leq n$,
uncensored observations in the match set M; with $t_{(0)} = 0$ and $t_{(k+1)} = \infty$, $m_i$
is the number of censored observations in $[t_{(i)}, t_{(i+1)})$; $z_{ij} = 1\,(0)$, if the j-th
censored subject in $[t_{(i)}, t_{(i+1)})$ is a case (control), $i = 1, 2, \ldots, k$,
$j = 1, 2, \ldots, m_i$; and $z_{(i)} = 1\,(0)$, if the subject with uncensored response time
$t_{(i)}$ is a case (control), $i = 1, 2, \ldots, k$. The scores $c_i$ and $C_i$,
$i = 1, 2, \ldots, k$, are functions of the observed pattern of censoring and the
density of the actual response time (Ref. 5: eq. 17).

Formally, the observed response time, $t$, is the minimum of an actual
response time, $t^0$, and a censoring time $u$: $t = \min(t^0, u)$. The hypothesis to be
tested is

$$H_0: f_i(t_1^0, t_2^0, \ldots, t_m^0) \text{ is symmetric in its arguments } t_1^0, t_2^0, \ldots, t_m^0,$$

where $f_i(t_1^0, t_2^0, \ldots, t_m^0)$ is the joint density of the actual response times
in the match set $M_i$, $i = 1, 2, \ldots, n$. If the observations within match sets
are assumed independent, $H_0$ reduces to

$$F_{1i}(t^0) = F_{0i}(t^0), \quad i = 1, 2, \ldots, n,$$

where $F_{1i}(t^0)$ and $F_{0i}(t^0)$ are the distribution functions of the case and
control populations for $M_i$. The tests of the form Q are designed
to test $H_0$ or its reduced form. Programmed in MSURV are four special cases of Q,
corresponding to two versions of $\hat\sigma^2$ and two sets of scores, $c_i$ and $C_i$,
$i = 1, 2, \ldots, k$: logrank and Wilcoxon scores.

The two variance estimates are:

$$\hat\sigma_B^2 = \sum_{i=1}^{k} a_i^2 \, p_i (1 - p_i), \qquad (3)$$

where $a_i = c_i - C_i$ and $p_i = n_{1i}/n_i$, with $n_{1i}$ being the number of cases at risk and
$n_i$ being the total number of subjects at risk at $t_{(i)}$, $i = 1, 2, \ldots, k$, and

$$\hat\sigma_U^2 = T^2. \qquad (4)$$


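The logrank version of T has scores:

$$c_i = \sum_{\kappa=1}^{i} n_\kappa^{-1} - 1, \quad C_i = \sum_{\kappa=1}^{i} n_\kappa^{-1}, \quad i = 1, 2, \ldots, k, \qquad (5)$$

and the Wilcoxon version of T has scores:

$$c_i = 1 - 2 \prod_{\kappa=1}^{i} \frac{n_\kappa}{n_\kappa + 1}, \quad C_i = 1 - \prod_{\kappa=1}^{i} \frac{n_\kappa}{n_\kappa + 1}, \quad i = 1, 2, \ldots, k. \qquad (6)$$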


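The binomial variance estimate for the logrank test is, therefore,

$$\hat\sigma_B^2 = \sum_{i=1}^{k} p_i (1 - p_i), \qquad (7)$$

and the binomial variance estimate for the Wilcoxon test is:

$$\hat\sigma_B^2 = \sum_{i=1}^{k} \Big( \prod_{\kappa=1}^{i} \frac{n_\kappa}{n_\kappa + 1} \Big)^2 p_i (1 - p_i). \qquad (8)$$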


MSURV calculates the following two logrank and two Wilcoxon tests on each
data set, $M_1, M_2, \ldots, M_n$:

LU = a logrank test of the form Q with each T of the form (eq. 2) with
     scores (eq. 5) and $\hat\sigma^2 = \hat\sigma_U^2$, given by (eq. 4)

LB = a logrank test of the form Q with each T of the form (eq. 2) with
     scores (eq. 5) and $\hat\sigma^2 = \hat\sigma_B^2$, given by (eq. 7)

WU = a Wilcoxon test of the form Q with each T of the form (eq. 2) with
     scores (eq. 6) and $\hat\sigma^2 = \hat\sigma_U^2$, given by (eq. 4)

WB = a Wilcoxon test of the form Q with each T of the form (eq. 2) with
     scores (eq. 6) and $\hat\sigma^2 = \hat\sigma_B^2$, given by (eq. 8).


Each test is coded to accommodate tied data. When ties among the
uncensored observations are present, the scores are first calculated as if no ties
existed, and then the average of the scores for the tied observations is
used for each tied observation. The probability that a standard normal
variable will exceed the observed value of each statistic, the p-value, is
printed for each statistic. Data with cases dying before the controls will
produce large values of the statistics. Michalek and Mihalko have discussed
these and other linear rank tests for matched designs (Ref. 4).
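As an illustration of how T and the two variance estimates combine for a single match set, the following is a minimal sketch, not the MSURV implementation. It assumes no tied times and reads the scores as $c_i = \sum_{\kappa \le i} n_\kappa^{-1} - 1$, $C_i = \sum_{\kappa \le i} n_\kappa^{-1}$ (logrank) and $c_i = 1 - 2\prod_{\kappa \le i} n_\kappa/(n_\kappa+1)$, $C_i = 1 - \prod_{\kappa \le i} n_\kappa/(n_\kappa+1)$ (Wilcoxon), with T the sum of $c_i$ over uncensored cases plus $C_i$ over cases censored in $[t_{(i)}, t_{(i+1)})$. The function name `match_set_statistic` is hypothetical.

```python
def match_set_statistic(records, kind="logrank"):
    """records: list of (t, delta, z) for one match set; assumes no tied times.

    Returns (T, binomial variance estimate, unconditional estimate T^2).
    """
    deaths = sorted(t for t, d, _ in records if d == 1)
    T = var_b = 0.0
    harmonic, product = 0.0, 1.0
    for i, t in enumerate(deaths):
        upper = deaths[i + 1] if i + 1 < len(deaths) else float("inf")
        at_risk = [r for r in records if r[0] >= t]
        n_i = len(at_risk)
        harmonic += 1.0 / n_i
        product *= n_i / (n_i + 1.0)
        if kind == "logrank":
            c, C = harmonic - 1.0, harmonic              # assumed reading of eq. 5
        else:
            c, C = 1.0 - 2.0 * product, 1.0 - product    # assumed reading of eq. 6
        z_death = sum(z for x, d, z in records if d == 1 and x == t)
        z_cens = sum(z for x, d, z in records if d == 0 and t <= x < upper)
        T += c * z_death + C * z_cens                    # contribution to T (eq. 2)
        p_i = sum(z for _, _, z in at_risk) / n_i        # proportion of cases at risk
        var_b += (c - C) ** 2 * p_i * (1.0 - p_i)        # binomial estimate (eq. 3)
    return T, var_b, T * T                               # T^2 is the estimate of eq. 4

# One case (z=1) dying at t=1 matched to one control dying at t=2.
print(match_set_statistic([(1.0, 1, 1), (2.0, 1, 0)]))
```

Summing T and the chosen variance estimate over the match sets of a stratum would supply the ingredients of the four tests; tied observations would additionally need the score averaging described above.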

4. THE GAIL AND WARE TEST

The following procedure is derived from M. H. Gail and J. H. Ware
(Ref. 1). The Gail and Ware procedure tests the null hypothesis of equality
of a study population survival distribution and a known survival distribution
given in the form of a life table, under the assumption that the study hazard
function is proportional to the tabled hazard function. The alternatives are
that: (a) the study survival time is stochastically less than the tabled
values, or (b) the study survival time is stochastically greater than the
tabled values.

The input consists of two data sets: a sample of survival data and a
published life table. The survival data are of the form

$$(x_1, \Delta_1), (x_2, \Delta_2), \ldots, (x_N, \Delta_N),$$

where $x_\ell$ is the age at death or censoring of the $\ell$-th subject,
$\ell = 1, 2, \ldots, N$, and

$\Delta_\ell = 1$, if $x_\ell$ is an age at death
$\Delta_\ell = 0$, if $x_\ell$ is an age at censoring, $\ell = 1, 2, \ldots, N$,

where censoring may be due to: loss to follow-up; death from causes other
than those of interest; or survival to the time of the analysis.


The published life table is of the form:

Age interval          Number surviving     $\hat q$

$[a_0, a_1)$          $m_0$                $\hat q_0$
$[a_1, a_2)$          $m_1$                $\hat q_1$
  .                     .                    .
  .                     .                    .
  .                     .                    .
$[a_I, a_{I+1})$      $m_I$                $\hat q_I$

where, for $j = 0, 1, \ldots, I$,

$$q_j = P[\text{dying in } [a_j, a_{j+1}) \mid \text{survival up to age } a_j]$$

and, with $a_0 = 0$ and $a_{I+1} = \infty$,

Pj * ^j+l •••?!•

Letting $T_L$ denote survival time in the life table population and $T_S$ denote
survival time in the study population, we want to
test, under the proportional hazards assumption:

$H_0$: $P(T_L > t) = P(T_S > t)$
versus $H_1$: $P(T_L > t) > P(T_S > t)$,
or $H_2$: $P(T_L > t) < P(T_S > t)$.


To this end, we define, for $j = 0, 1, \ldots, I$:

$n_j$ = number of study subjects entering $[a_j, a_{j+1})$

$w_j$ = number of study subjects censored in $[a_j, a_{j+1})$

$d_j$ = number of study subjects dying in $[a_j, a_{j+1})$

$s_j$ = number of study subjects surviving $[a_j, a_{j+1})$.


Define the life-table hazard-function estimate as:

$$h(t) = (-1/\delta) \log(1 - q_j), \quad a_j \leq t < a_{j+1},$$

where we have assumed that $a_{j+1} - a_j = \delta$, $j = 0, 1, \ldots$. The expected number,
$e_j$, of study deaths in $[a_j, a_{j+1})$ is estimated by:

$$e_j = n_j h(a_j) \delta \, [1 - h(a_j)\delta/2 - w_j/(2 n_j)].$$

The deviation between the observed number of study deaths, $d_j$, in the
interval and the estimated expected number, $e_j$, is simply

$$D_j = d_j - e_j, \quad j = 0, 1, \ldots, I.$$


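The variance of $D_j$ is estimated by:

$$\hat\sigma_j^2 = [d_j (n_j - d_j) + w_j (n_j - w_j) h(a_j)^2 \delta^2 / 4 - h(a_j) d_j w_j \delta] / n_j.$$

The statistic is:

$$GW = \sum_{j=0}^{I} D_j \Big/ \Big( \sum_{j=0}^{I} \hat\sigma_j^2 \Big)^{1/2}.$$

The null hypothesis, $H_0$, is rejected in favor of $H_2$ at the .05 level of
significance when $GW \leq -1.645$; $H_0$ is rejected in favor of $H_1$ when $GW \geq 1.645$.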


The respective p-values are printed. MSURV is coded both to compare cases
with the life table using GW and to make a separate comparison of the
controls with the life table.
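The interval computations of this section can be sketched as follows. This is an illustrative outline only: the function names `gail_ware` and `upper_tail_p` are hypothetical, the statistic is assumed to be the sum of the deviations $D_j$ divided by the square root of the summed variance estimates, and all intervals are assumed to have the common width `delta`.

```python
import math

def gail_ware(n, w, d, q, delta):
    """Standardized sum of observed-minus-expected deaths over the age intervals.

    n, w, d : study subjects entering, censored, and dying in each interval
    q       : life-table conditional probabilities of dying, q_j
    delta   : common interval width (an assumption of this sketch)
    """
    total_dev = total_var = 0.0
    for n_j, w_j, d_j, q_j in zip(n, w, d, q):
        if n_j == 0:
            continue                                   # no study subjects in this interval
        h = -math.log(1.0 - q_j) / delta               # life-table hazard on the interval
        e_j = n_j * h * delta * (1.0 - h * delta / 2.0 - w_j / (2.0 * n_j))
        total_dev += d_j - e_j                         # deviation D_j
        total_var += (d_j * (n_j - d_j)
                      + w_j * (n_j - w_j) * (h * delta) ** 2 / 4.0
                      - h * d_j * w_j * delta) / n_j   # variance estimate of D_j
    return total_dev / math.sqrt(total_var)

def upper_tail_p(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

gw = gail_ware(n=[10], w=[0], d=[2], q=[0.1], delta=1.0)
print(gw, upper_tail_p(gw))
```

Under these assumptions, a large positive value corresponds to more study deaths than the life table predicts, and a large negative value to fewer.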


5. VARIABLE DEFINITIONS

IST         =  stratum number
NCA         =  number of cases in match set
NCT         =  number of cases and controls in match set
AGE(I,N)    =  age at event time, subject I, match set N
X(I,N)      =  event time
DELTA(I,N)  =  censoring indicators
NSTRAT      =  number of strata to be used
ISTRAT(I)   =  contains the numbers of the strata to be used
STNAME      =  contains the names of the strata
NCAS(N)     =  number of cases for the N-th match set
NCON(N)     =  number of controls for the N-th match set
ICNT        =  number of match sets in a stratum
KCAS        =  number of cases in a stratum
KCON        =  number of controls in a stratum.

6. PROGRAM FLOW AND INPUT

The main program loop is over strata. Event times and censoring
indicators are passed to subroutine KMSURV in the arrays TS(I) and IDEL(I),
I = 1, 2, ..., ICNT, first for the cases and then for the controls in the
stratum. KMSURV produces tables of survival estimates and, if ICEN ≠ 0, censoring
distribution estimates for cases and controls; if IHWBD ≠ 0, it also produces a table of
confidence bounds for the survival curves for the cases and controls. If IPLOT is not
set equal to zero, survival estimates and confidence bands are plotted. The
various statistics and their associated p-values are printed.

The following constitutes the control input to MSURV:

TITLE - up to 80 characters describing the run (20A4)

The following six control variables are read unformatted:

NSTRAT  =  number of strata to be used

ICEN    =  if not equal to zero, estimates the
           censoring distribution

IHWBD   =  if not equal to zero, confidence bands
           are printed

IPLOT   =  if not equal to zero, survival curve
           estimates and confidence bands are
           plotted

STNAME  =  the names of the strata, 8 characters
           each, corresponding to the numbers in
           ISTRAT, read A8

FMT     =  data format.

Data are read from unit 9 in the following order for each record:
stratum number; NCA; NCT; and age, event time, and censoring indicator for the NCA
cases and for the (NCT-NCA) controls.

7. COMPUTATION FORMULAS FOR LINEAR RANK PROCEDURES
7.1 Main Program

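The linear rank procedures in MSURV are coded as follows. The rows
of the data array M are ranked in order of ascending values of t to obtain:

$$\begin{array}{ccc}
t_{(1)} & \Delta_{(1)} & z_{(1)} \\
t_{(2)} & \Delta_{(2)} & z_{(2)} \\
\vdots & \vdots & \vdots \\
t_{(m)} & \Delta_{(m)} & z_{(m)}
\end{array}$$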

The following array is then formed:

$$\begin{array}{cccc}
n_1 & d_1 & z_{13} & z_{14} \\
n_2 & d_2 & z_{23} & z_{24} \\
\vdots & \vdots & \vdots & \vdots \\
n_k & d_k & z_{k3} & z_{k4}
\end{array}$$

in which, for $i = 1, 2, \ldots, k$:

$k$ = number of distinct death times
$n_i$ = number of subjects at risk at $t_{(i)}$
$d_i$ = number of deaths at $t_{(i)}$
$z_{i3}$ = sum of the z's for the deaths at $t_{(i)}$
$z_{i4}$ = sum of the z's for the censorings in $[t_{(i)}, t_{(i+1)})$.
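The construction of this array from one ranked match set can be sketched as below. The function name `risk_table` is hypothetical, and each record is assumed to be a tuple (t, delta, z) with delta = 1 for an uncensored time and z = 1 for a case.

```python
def risk_table(records):
    """Rows [n_i, d_i, z_i3, z_i4], one per distinct death time of a match set."""
    ordered = sorted(records)                          # rank rows by ascending t
    death_times = sorted({t for t, d, _ in ordered if d == 1})
    rows = []
    for i, t in enumerate(death_times):
        upper = death_times[i + 1] if i + 1 < len(death_times) else float("inf")
        n_i = sum(1 for x, _, _ in ordered if x >= t)                      # at risk at t_(i)
        d_i = sum(1 for x, d, _ in ordered if d == 1 and x == t)           # deaths at t_(i)
        z_i3 = sum(z for x, d, z in ordered if d == 1 and x == t)          # cases among deaths
        z_i4 = sum(z for x, d, z in ordered if d == 0 and t <= x < upper)  # cases among censorings
        rows.append([n_i, d_i, z_i3, z_i4])
    return rows

# Two tied deaths at t=1 (one case, one control), a censored case at t=2,
# a control death at t=3, and a censored control at t=4.
print(risk_table([(1, 1, 1), (1, 1, 0), (2, 0, 1), (3, 1, 0), (4, 0, 0)]))
```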

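The vector $N = (N_1, N_2, \ldots, N_D)$ is calculated, where $N_1 = n_1$, $N_2 = n_1 - 1$, ...,
$N_{d_1} = n_1 - d_1 + 1$, $N_{d_1+1} = n_2$, ..., $N_{d_1+d_2} = n_2 - d_2 + 1$, ..., and
$N_D = n_k - d_k + 1$, where $D = d_1 + d_2 + \cdots + d_k$. A subroutine is called to calculate the
scores $c_u$ and $C_u$, $u = 1, 2, \ldots, D$. The numbers $A_1, A_2, \ldots, A_k$ are calculated
by:

$$A_i = d_i^{-1} \sum_{u = D_{i-1}+1}^{D_i} a_u,$$

where $a_u = c_u - C_u$ and

$$D_i = \sum_{j=1}^{i} d_j, \quad i = 1, 2, \ldots, k,$$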
with $D_0 = 0$. The numerator of Q, given by (eq. 1), is then calculated using:


where 


The  binomial  variance  estimate  is  calculated  using: 

l( 

Oq  »  ^ )[(n^-d^)/(n^-l )]l [n^«l ), 

in  which 

$I(n_i \neq 1) = 0$ if $n_i = 1$
              $= 1$ otherwise, $i = 1, 2, \ldots, k$.

Note that $I(n_i \neq 1) = 0$ is possible only when $i = k$.
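The expansion of the at-risk counts into the vector N described in this section can be sketched as follows. The function name `expand_risk_counts` is hypothetical; the sketch assumes that a death time with $d_i$ tied deaths contributes the entries $n_i, n_i - 1, \ldots, n_i - d_i + 1$, and it also returns the cumulative death counts that mark the block of entries belonging to each death time.

```python
def expand_risk_counts(n, d):
    """Return (N, D): the expanded at-risk vector and cumulative death counts D_0..D_k."""
    N, D = [], [0]
    for n_i, d_i in zip(n, d):
        N.extend(n_i - r for r in range(d_i))   # n_i, n_i - 1, ..., n_i - d_i + 1
        D.append(D[-1] + d_i)                   # running total of deaths
    return N, D

N, D = expand_risk_counts(n=[5, 2], d=[2, 1])
print(N, D)
```

Entries `N[D[i-1]:D[i]]` (zero-based slicing) then correspond to the tied deaths at the i-th death time.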


7.2 Logrank Score Subroutine

The input is a D-dimensional vector of decreasing nonnegative integers
$N = (N_1, N_2, \ldots, N_D)$. The scores are calculated by:

$$c_u = 1 - \sum_{r=1}^{u} (1/N_r), \quad C_u = -\sum_{r=1}^{u} (1/N_r), \quad u = 1, 2, \ldots, D.$$
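A possible form of such a score routine is sketched below. It is not the MSURV subroutine: the function names are hypothetical, and the score formulas are assumed to be $c_u = 1 - \sum_{r \le u} 1/N_r$ and $C_u = -\sum_{r \le u} 1/N_r$. The second helper averages the uncensored scores over tied deaths, as the handling of ties is described for the linear rank tests.

```python
def logrank_scores(N):
    """Scores (c_u, C_u), u = 1..D, from the expanded at-risk vector N (entries > 0)."""
    c, C, running = [], [], 0.0
    for N_r in N:
        running += 1.0 / N_r        # running sum of 1/N_r
        c.append(1.0 - running)     # assumed score for an uncensored observation
        C.append(-running)          # assumed score for a censored observation
    return c, C

def average_over_ties(scores, d):
    """Average consecutive blocks of scores, one block of size d_i per death time."""
    out, start = [], 0
    for d_i in d:
        block = scores[start:start + d_i]
        out.append(sum(block) / d_i)
        start += d_i
    return out

c, C = logrank_scores([5, 4, 2])
print(c, C, average_over_ties(c, [2, 1]))
```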

7.3 Wilcoxon Score Subroutine

The input is a D-dimensional vector of decreasing nonnegative integers
$N = (N_1, N_2, \ldots, N_D)$. The scores are calculated by: